Matthew Ng, Maxim Chernetskiy, Andrew MacLachlan
7/22/2019
These are all the required libraries required for the next three remote-sensing tutorials.
library(sp)
library(RColorBrewer)
library(lattice)
library(latticeExtra)
library(raster)
library(rgeos)
library(rgdal)
library(rasterVis)
library(ggplot2)
library(RStoolbox)
library(gridExtra)
library(fpc)
library(cluster)
library(dismo)
library(rpart)
library(randomcoloR)
library(randomForest)
library(plotly)
library(tidyverse)
library(htmlwidgets)There are several places that remote sensing data can be obtained. Depending on your coverage requirements, they may be available in various formats, resolutions, and sources.
For images in the USGS Landsat programme, my favourite repositories are:
You will need to create an account download the data should you wish to extend this tutorial to another area.
N.B. Bearing in mind the intended study area, these files can quickly become very large, and your workflow should be adapted to this.
For the purposes of this tutorial, the data we are going be used is available here:
https://drive.google.com/open?id=1NPUAqw1VJ6_jUNfp7srtmdMg4Oh10CWr
Go ahead and download this folder and save this into your local machine.
Then set the working directory to be the folder with the data you have just downloaded
root_dir <- "S:\\CASA_obits_ucfnoth\\4. Collaborations\\SS-ASM19-Materials\\Remote Sensing\\data\\"
knitr::opts_knit$set(root.dir = root_dir)The raster data that we will be using today is of the Manchester metropolitan area taken from the United States Geological Survey Landsat 8 OLI/TIR satellite.
There are a total of 11 bands, saved in .TIFF format. Each sensor (band) captures different parts of the electromagnetic spectrum; and, what is captured heavily depends on what is reflected by the objects on the surface of the earth.
The data is presented in a 30m by 30m resolution for most bands; except, in Band 8, which is called the panchromatic band represented in 15m by 15m cells; and, Band 10 and Band 11, which cover a larger scale of 100m x 100m.
The table below gives the wavelengths and resolutions of the various bands that make up a Landsat 8 image. We will refer back to this table in the following exercises over the following tutorials.
## OGR data source with driver: ESRI Shapefile
## Source: "S:\CASA_obits_ucfnoth\4. Collaborations\SS-ASM19-Materials\Remote Sensing\data\manchester_boundary.shp", layer: "manchester_boundary"
## with 1 features
## It has 5 fields
## Integer64 fields read as strings: OBJECTID
# Load our Landsat Data. We will be using Landsat 8's Band 1 to 8 in this practical.
manc_error <- paste0("LC08_L1TP_203023_20190513_20190521_01_T1_B", 1:8, ".tif")
# There is however an error with this dataset in that the panchromatic Band 8 does not fully align with the extent of the other raster layers. This prevents us from stacking all 8 bands into a single composite, as shown:
stack(manc_error)## class : RasterStack
## dimensions : 7981, 7891, 62978071, 8 (nrow, ncol, ncell, nlayers)
## resolution : 30, 30 (x, y)
## extent : 451185, 687915, 5763885, 6003315 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=30 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
## names : LC08_L1TP//1_01_T1_B1, LC08_L1TP//1_01_T1_B2, LC08_L1TP//1_01_T1_B3, LC08_L1TP//1_01_T1_B4, LC08_L1TP//1_01_T1_B5, LC08_L1TP//1_01_T1_B6, LC08_L1TP//1_01_T1_B7, LC08_L1TP//1_01_T1_B8
## min values : 0, 0, 0, 0, 0, 0, 0, 0
## max values : 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535
The error with extents is quite common in RS analysis. There are several ways to go about to rectify this. One method is to sharpen all bands with Band 8. The other method is to resample Band 8 with the other bands used in our composite.
Both ways take a little bit of time, but they are to ensure consistency throughout.
We may now want to think about the pros and cons of one method over the other in your own research.
The panchromatic sharpening can be used with the panSharpen() function in the RStoolbox library.
In this tutorial, we are going to be using the resample method with the code below.
# So now, let's fix this extent issue.
# First, let's load the raster stack from just B1 to B7
manc_files <- paste0("LC08_L1TP_203023_20190513_20190521_01_T1_B", 1:7, ".tif")
manchester <- stack(manc_files)
plot(manchester)# To resample, the following code should do the trick
# b8list <- list.files("C:/Users/mattk/OneDrive/Desktop/rs",pattern=".*LC08_L1TP_203023_20190513_20190521_01_T1_B[8]\\.tif$", ignore.case=TRUE, full.names=TRUE)
# b8 <- raster(b8list)
# ngb is the ```nearest neighbour``` method. This will take some time.
# b1 <- manchester$LC08_L1TP_203023_20190513_20190521_01_T1_B1
# b8 <- resample(b8, b1, method='ngb')
# Let's overwrite B8 with the new extent in our workspace.
# b8 <- writeRaster(b8, "LC08_L1TP_203023_20190513_20190521_01_T1_B8.tif", overwrite=TRUE)Great! Let’s re-stack of the raster layers back into one clean, single composite..
manc_full <- manchester # typically, I would addLayer(Manchester, b8)
# Let's compare the extents now..
# compareRaster(manc_full$LC08_L1TP_203023_20190513_20190521_01_T1_B1,manc_full$LC08_L1TP_203023_20190513_20190521_01_T1_B8)We are now going to clip the raster to the Manchester metropolitan area. The current raster covers a very large area, extending all the way down to the midlands. As such, it is a very large file to work with and may not be necessary given your intended research. For the purposes of this tutorial, we only want the Manchester metropolitan area. To get this area, we are going to clip and mask the manc_full raster with the manchester_shape layer.
# Here are some other codes to look at various aspects of the composite's metadata
crs(manc_mask) # projection## CRS arguments:
## +proj=utm +zone=30 +datum=WGS84 +units=m +no_defs +ellps=WGS84
## +towgs84=0,0,0
## class : Extent
## xmin : 540915
## xmax : 558675
## ymin : 5917125
## ymax : 5933115
## [1] 315536
## [1] 533 592 7
## [1] 7
## [1] 30 30
# The above data however don't tell use about what is being captured within each band. For that, we can look at their histograms.
hist(manc_mask,
xlab = "spectral response",
breaks = 30)
# We can look at how they relate to each other
pairs(manc_mask[[1:7]])Let’s now have actually visualise our data. By manipulating which bands to display, we can take advantage of the resulting colours to highlight certain aspects of our raster.
Let’s set out the code for the a True Colour Composite which is what we naturally see with our eyes. Other published combinations can be seen in the following table:
# With ggplot
ggRGB(manc_mask, r=4, g=3, b=2, stretch = "lin")+
ggtitle("Natural Colour\n (R= Red, G= Green, B= Blue)") +
xlab("long") +
ylab("lat") + theme_light()# Using the combinations above you could try to show several other band combinations on your own time
ggRGB(manc_mask, r=5, g=4, b=3, stretch = "lin")+
ggtitle("False Colour") +
xlab("long") +
ylab("lat") + theme_light()ggRGB(manc_mask, r=5, g=6, b=7, stretch = "lin")+
ggtitle("Bands 5,6 and 7") +
xlab("long") +
ylab("lat") + theme_light()Up to now, we’ve just been looking at raster the rasters at quite a superficial level of detail. Going back to the first table, we already know that different bands represent different parts wavelengths captured by the electromagnetic spectrum.
Therefore, if we know that that vegetation typically absorb within the Red band, but reflect within the Near-Infrared bands, we can begin to think of ways to extract vegetation from the image using raster formulas:
\[ Veg = \frac{NIR-Red}{NIR+Red}\] Here is a general function to do this calculation
nd <- function(img, a, b) {
ba <- img[[a]]
bb <- img[[b]]
i <- (ba - bb) / (ba + bb)
return(i)
}
# Let's plug in the numbers for the Vegetation.
ndvi <- nd(manc_mask, 5, 4)
# Let's look at the plot
plot(ndvi, col = rev(terrain.colors(10)), main = "Manchester-NDVI")# Let's look at the histogram for this dataset
hist(ndvi, breaks = 40, main = "NDVI Histogram", xlim = c(-.3,.8))## We can reclassify to the raster to show use what is most likely going to vegetation based on the histogram
## Typically you can cbind up to the 3rd quartile as a rule of thumb
veg <- reclassify(ndvi, cbind(-Inf, 0.3, NA))
plot(veg, main = 'Possible Veg cover')## Let's look at this in relation to Manchester as a whole
plotRGB(manc_mask, axes = TRUE, stretch = "lin", main = "Landsat True Color Composite")
plot(veg, add=TRUE, legend=FALSE)As an extension, we can try these same calculations for various other elements in our raster. For example, the formulae for the built environment or water bodies can be expressed below. You can take these as extension task in your own free time.
\[ built = \frac{SWIR1 - NIR}{SWIR1 + NIR} \] \[ water = \frac{Green - SWIR1}{Green + SWIR1} \]
These masks are the simplest way of attaining quick visualisations of your raster dataset.
In the next tutorial, Maxim will take this idea to the next level to seperate not just vegetation from the Manchester scene, but also to identify different species of plants within these scenes.
This idea can be extended to built-environment if we know what surface materials are being used. For example, in zinc sheets are used in roofs in many informal settlements in Asia, we can then calculate growth of these informal settlements using the right mathematical equation. HINTS FOR CLEAN GROWTH THURST
Classification in remote sensing is the process of arranging pixels into group of classes. Classification can be unsupervised (clustering) or supervised.
Unsupervised classification means what we don’t know "“truth” about an image but we can group pixels into classes according to their spectral properties. There are many unsupervised methods which have their advantages for different types of data. In the figure below you can see example of application of some of these methods to synthetic datasets (https://scikit-learn.org/stable/modules/clustering.html).
We can see that different methods work differently depending on patterns. The interesting case in the last row which shows homogenious distribution. Only three of nine methods group all pixels into one class.
We can apply some of these methods to our landsat 8 Manchester scene.
We see that KMeans and GaussianMixture give noisy but resonable results (compare to RGB) and DBSCAN producing most noisy results. Method Birch looks less noisy and gives distinct spatial patterns. However, the last method takes longest time to run. This method isn’t available in R and we will not use it.
K-means is one of most popuar unsupervised classification methods (Tou and Gonzalez, 1974; Fränti and Sieranoja, 2019). We can see to the method through the cost function which looks like:
\(J=\sum\limits_{i=1}^n \min\limits_{j \in \{i..k\}}||x_i-c_j||^2\)
This means that we adjust n pixels into k classes by minimization of the cost function J. \(c_j\) where \(j=1..k\) are centroids. Distance to centroids determines classes. Centroids firstly initialized randomly and then adjasting according to cost function J.
Let’s give names to the Landsat bands
Open the Landsat image and crop it to the area of Manchester.
# Create list of bands file paths
lst_names <- manc_error
lst_names <- lst_names[c(1,2,3,4,5,6,7)]
b_stack0 <- stack(lst_names)
lst_names## [1] "LC08_L1TP_203023_20190513_20190521_01_T1_B1.tif"
## [2] "LC08_L1TP_203023_20190513_20190521_01_T1_B2.tif"
## [3] "LC08_L1TP_203023_20190513_20190521_01_T1_B3.tif"
## [4] "LC08_L1TP_203023_20190513_20190521_01_T1_B4.tif"
## [5] "LC08_L1TP_203023_20190513_20190521_01_T1_B5.tif"
## [6] "LC08_L1TP_203023_20190513_20190521_01_T1_B6.tif"
## [7] "LC08_L1TP_203023_20190513_20190521_01_T1_B7.tif"
band_stack <- crop(b_stack0, extent(b_stack0, 2314, 2814, 3047, 3547))
names(band_stack) <- band_names
#plot all bands
plot(band_stack, legend = FALSE, main=band_names, col=grey(1:100/100))Refering back to the first practical, et’s see RGB image made by different combination of spectral bands. This will help us to interprete results of classification.
We can see that vegetation is easier to recognize on the false color composite when Near Infrared (NIR) band is placed into red chanel. Healthy vegetation absorbes radiation in the Red range and strongly reflects in NIR. RGB combination of infra red bands shows vegetation as orange becouse Short Wave Infrared (SWIR) bands related to leaf water content. The difference with false color is that we can regonize not just vegetation but also different types of vegetation according to its water content.
Convert stack of 2D matrices to an array of vectors becouse classification code doesn’t work with 2D arrays.
Let’s see RGB image made by different combination of spectral bands. This will help us to interprete results of classification.
Convert stack of 2D matrices to an array of vectors becouse classification code doesn’t work with 2D arrays.
Look at the dimentionality of that vector.
## int [1:251001, 1:7] 10199 10160 10092 10096 10153 10164 10144 10088 10028 10018 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:7] "ultra.blue" "blue" "green" "red" ...
Set seed
Do the classification!
# Here we ommiting not a number values by na.omiy, set up number of centroids toten, maximum number of iteration to 500.
kmncluster <- kmeans(na.omit(b_vec), centers = 10, iter.max = 500, nstart = 5, algorithm="Lloyd")Look at the return value (str)
## List of 9
## $ cluster : int [1:251001] 10 10 10 10 6 5 7 7 7 7 ...
## $ centers : num [1:10, 1:7] 11013 12006 15991 10828 10516 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:10] "1" "2" "3" "4" ...
## .. ..$ : chr [1:7] "ultra.blue" "blue" "green" "red" ...
## $ totss : num 5.28e+12
## $ withinss : num [1:10] 7.74e+10 8.57e+10 1.43e+11 6.86e+10 6.54e+10 ...
## $ tot.withinss: num 8.15e+11
## $ betweenss : num 4.46e+12
## $ size : int [1:10] 54703 20149 1512 24959 31665 34882 6802 7639 45677 23013
## $ iter : int 151
## $ ifault : NULL
## - attr(*, "class")= chr "kmeans"
Convert returned vector to 2D raster. Operator $ takes record “cluster” from returned list (look at previous chunk).
Set up colors for classes
The CLARA (Clustering LARge Application) algorithm is based on the Partition Around Medoids (PAM) algorithm which is an implementation of K-medoids algorithm (Kaufman and Rousseeuw, 2005).
The task of the K-medoids algorithm is to minimize the cost function \(J=min\sum\limits_{i=1}^n\sum\limits_{j=1}^nd(i,j)z_{i,j}\)
For the CLARA classifiction we use the same vector ‘b_vec’ as for the k-means classification
# Plot the results of both clustering methods
plot(knr, main = 'K-means classification', col = mycolor ) We ca see that CLARA method has separated different types of vegetation meanwhile K-Means grouped them into one cluster. Also CLARA better seoarated urban area. At the same time K-Means separated water to different cluster
Supervised clussification is one of the main tool in working with Remote Sensing data. There are a lot of different methods with their own advantages and disadvantages. In this exercise we will use Classification and Regression Trees (CART) (Lawrence, 2001) and Random Forest (Ho, 1998) methods.
Land Cover Map 2015 (LCM2015) will be used as reference data. LCM2015 is made using satellite data and cartographical information and provides land cover information for the UK.Spatial resolution 25m. LCM2015 has following classes:
0 Unclassified, 1 Broadleaved Woodland, 2 Coniferous Woodland, 3 Arable and Horticulture, 4 Improved Grassland, 5 Neutral Grassland, 6 Calcareous Grassland, 7 Acid grassland, 8 Fen, Marsh and Swamp, 9 Heather, 10 Heather grassland, 11 Bog, 12 Inland Rock, 13 Saltwater, 14 Freshwater, 15 Supra-littoral Rock, 16 Supra-littoral Sediment, 17 Littoral Rock, 18 Littoral sediment, 19 Saltmarsh, 20 Urban, 21 Suburban.
Setup class names and colors
# Get LCM data
lcm <- brick('lcm2015_landsat_stack.tif')
# Crop LCM data using row and column numbers:
lcm <- crop(lcm, extent(lcm, 2314, 2814, 3047, 3547))
names(lcm) <- c("lcm2015", "prob")
# The class names
class_names <- c("Broadleaved Woodland", "Coniferous Woodland", "Arable and Horticulture", "Improved Grassland", "Neutral Grassland", "Calcareous Grassland", "Acid grassland", "Fen, Marsh and Swamp", "Heather", "Heather grassland", "Bog", "Inland Rock", "Saltwater", "Freshwater", "Supra-littoral Rock", "Supra-littoral Sediment", "Littoral Rock", "Littoral sediment", "Saltmarsh", "Urban", "Suburban")
lcmclass <- c()
u_val = unique(lcm[[1]])
i = 1
for (v in u_val){
lcmclass <- append(lcmclass, class_names[v])
}
classdf <- data.frame(classvalue1 = c(0:8), classnames1 = lcmclass)## [1] 1 3 4 5 6 12 14 20 21
Ratify (RAT = “Raster Attribute Table”) the LCM2015. It defines RasterLayer as a categorical variable. Such a RasterLayer is linked to other values via a “Raster Attribute Table” (RAT).
lcm_rat <- lcm[[1]]
lcm_rat <- ratify(lcm_rat)
rat <- levels(lcm_rat)[[1]]
rat$landcover <- lcmclass
levels(lcm_rat) <- rat# Set the random number generator to reproduce the results
set.seed(99)
# Sampling by loading the training sites locations
samp <- sampleStratified(lcm_rat, size = 100, na.rm = TRUE, sp = TRUE)## Warning in .local(x, size, ...): only 7 cells found for stratum 6
## Warning in .local(x, size, ...): fewer samples than requested found for
## stratum: 6
## class : SpatialPointsDataFrame
## features : 807
## extent : 542580, 557580, 5918940, 5933910 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=30 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
## variables : 2
## names : cell, lcm2015
## min values : 40, 1
## max values : 250449, 21
## < table of extent 0 >
We use color scheme provided by LCM in lcm2015gb25m.tif-Band_1.clr file
#cl = read.table('j:\\satellite\\RS_workshop\\LCM2015\\lcm-2015-25m_3034373\\lcm2015gb25m.tif-Band_1.clr', header = FALSE, sep = "", dec = ".")
#cl = read.table('LCM2015/lcm-2015-25m_3034373/lcm2015gb25m.tif-Band_1.clr', header = FALSE, sep = "", dec = ".")
cl = read.table('lcm2015gb25m.tif-Band_1.clr', header = FALSE, sep = "", dec = ".")
# make an empty vector
col_vector <- c()
# load normalized colors to col_vector
for (i in 2:dim(cl)[1]){
col_vector <- append(col_vector, rgb(cl[i, 2]/255, cl[i, 3]/255, cl[i,4]/255))
}# detach('package:tidyr')
# detach('package:tidyverse')
# detach('package:plotly')
# detach('package:ggplot2')
plt <- levelplot(lcm_rat, col.regions = col_vector[u_val], main = 'Training Sites')
print(plt + layer(sp.points(samp, pch = 3, cex = 0.5, col = "white")))## Error: Attempted to create layer with no geom.
## Error in UseMethod("extract_"): no applicable method for 'extract_' applied to an object of class "c('RasterBrick', 'Raster', 'RasterStackBrick', 'BasicRaster')"
# sampvals no longer has the spatial information. To keep the spatial information you use `sp=TRUE` argument in the `extract` function.
# drop the ID column
sampvals <- sampvals[, -1]## Error in eval(expr, envir, enclos): object 'sampvals' not found
# combine the class information with extracted values
sampdata <- data.frame(classvalue = samp@data$lcm2015, sampvals)## Error in data.frame(classvalue = samp@data$lcm2015, sampvals): object 'sampvals' not found
# Train the model
cart <- rpart(as.factor(classvalue)~., data=sampdata, method = 'class', minsplit = 5)## Error in is.data.frame(data): object 'sampdata' not found
## Error in plot(cart, uniform = TRUE, main = "Classification Tree"): object 'cart' not found
## Error in text(cart, cex = 0.8): object 'cart' not found
# Train the model
rforest <- randomForest(as.factor(classvalue)~., data=sampdata, method = 'class', minsplit = 5)## Error in eval(m$data, parent.frame()): object 'sampdata' not found
## Error in plot(rforest): object 'rforest' not found
CART Predict the subset data based on the model
## Error in inherits(model, "DistModel"): object 'cart' not found
## Error in ratify(pr): object 'pr' not found
Random Forest predict the subset data based on the model
## Error in .local(object, ...): object 'rforest' not found
## Error in ratify(pr_forest): object 'pr_forest' not found
## Error in levels(pr): object 'pr' not found
## Error in eval(expr, envir, enclos): object 'rat1' not found
## NULL
## Error in levels(pr_forest): object 'pr_forest' not found
class11_forest <- c()
i = 1
for (v in rat1_forest$ID){
class11_forest <- append(class11_forest, class_names[v])
}## Error in eval(expr, envir, enclos): object 'rat1_forest' not found
## NULL
Here we see that Random Forest has found more land cover classes than CART.
## Error in rat1$legend <- class11: object 'rat1' not found
## Error in eval(expr, envir, enclos): object 'rat1' not found
levelplot(pr, maxpixels = 1e6,
col.regions = col_vector[rat1$ID],
scales=list(draw=FALSE),
main = "CART classification of Landsat 8")## Error in levelplot(pr, maxpixels = 1e+06, col.regions = col_vector[rat1$ID], : object 'pr' not found
## Error in rat1_forest$legend <- class11_forest: object 'rat1_forest' not found
## Error in eval(expr, envir, enclos): object 'rat1_forest' not found
levelplot(pr_forest, maxpixels = 1e6,
col.regions = col_vector[rat1_forest$ID],
scales=list(draw=FALSE),
main = "Random Forest classification of Landsat 8")## Error in levelplot(pr_forest, maxpixels = 1e+06, col.regions = col_vector[rat1_forest$ID], : object 'pr_forest' not found
Here we can see that two classes which were found by Random Forest (‘Calcareous Grassland’ and ‘Neutral Grassland’) are very small and can be neglected. The class ‘Arable and Horticulture’ is located in the top left corner of the Training Sites image and it has been found by Random Forest around the same area. CART has claffified some urban ares as ‘Inland Rock’ whereas Random Forest correctly grouped these pixels as ‘Urban’.
## Error in eval(expr, envir, enclos): object 'sampdata' not found
## Error in kfold(sampdata11, k = 5, by = sampdata11$classvalue): object 'sampdata11' not found
## Error in table(j): object 'j' not found
## Error in eval(expr, envir, enclos): object 'sampdata11' not found
x <- list()
for (k in 1:5) {
train <- sampdata11[j!= k, ]
test <- sampdata11[j == k, ]
cart <- rpart(as.factor(classvalue)~., data=train, method = 'class', minsplit = 5)
pclass <- predict(cart, test, type='class')
# create a data.frame using the reference and prediction
x[[k]] <- cbind(test$classvalue, as.integer(pclass))
}## Error in eval(expr, envir, enclos): object 'sampdata11' not found
## Error in names(x) <- value: 'names' attribute [2] must be the same length as the vector [0]
## Error in names(dn) <- dnn: attempt to set an attribute on NULL
## Error in colnames(conmat) <- class11: object 'conmat' not found
## Error in rownames(conmat) <- class11: object 'conmat' not found
## Error in eval(expr, envir, enclos): object 'conmat' not found
## Error in eval(expr, envir, enclos): object 'conmat' not found
## function ()
## {
## from_context("..group_size")
## }
## <bytecode: 0x000000001cbc3b90>
## <environment: namespace:dplyr>
## Error in diag(conmat): object 'conmat' not found
## Error in sum(diag): invalid 'type' (closure) of argument
## Error in eval(expr, envir, enclos): object 'OA' not found
## Error in apply(conmat, 1, sum): object 'conmat' not found
## Error in eval(expr, envir, enclos): object 'rowsums' not found
## Error in apply(conmat, 2, sum): object 'conmat' not found
## Error in eval(expr, envir, enclos): object 'colsums' not found
## Error in eval(expr, envir, enclos): object 'p' not found
## Error in eval(expr, envir, enclos): object 'OA' not found
## function (z, ...)
## UseMethod("kappa")
## <bytecode: 0x00000000223d64b8>
## <environment: namespace:base>
## Error in eval(expr, envir, enclos): object 'colsums' not found
## Error in eval(expr, envir, enclos): object 'rowsums' not found
## Error in data.frame(producerAccuracy = PA, userAccuracy = UA): object 'PA' not found
## Error in eval(expr, envir, enclos): object 'outAcc' not found
Here we are going to compute temperature from Landsat data — there are many methods that can be found within literature to do so but we will use the one originally developed by Artis & Carnahan (1982), recently summarised by Guha et al. 2018 and and Avdan and Jovanovska (2016).
Some of the terms used our outlined in the remote sensing background section at the end of the document, so check back there if you get confused.
Calculate the Top of Atmopshere (TOA) spectral radiance from the Digital Number (DN) using:
\[\lambda= Grescale * QCAL + Brescale\]
TOA spectral radiance is light reflected off the Earth as seen from the satellite measure in radiance units.
In this equation Grescale and Brescale represent the gain and bias of the image, with QCAL the Digital Number (DN) — how the raw Landsat image is captured.
Grescale and Brescale are available from the .MTL file provided when you downloaded the Landsat data. Either open this file in notepad and extract the required values for band 10 gain (MULT_BAND) and bias (ADD_BAND)
…Or we can automate it using the MTL() function within the RStoolbox package.
library(ggplot2)
library(plotly)
MTL <- list.files(".\\",pattern="*.txt", ignore.case=TRUE, full.names = TRUE)
readMTL<-readMeta(MTL)
#To see all the attributes
readMTLIn this part of the practical, we are going to be using B10 of the Manchester scene.
e <- extent(manchester_shape)
manc_b10 <- raster("LC08_L1TP_203023_20190513_20190521_01_T1_B10.tif")
clip_b10 <- crop(manc_b10,e)
b10_mask <- mask(clip_b10, manchester_shape)
offsetandgain <- subset(readMTL$CALRAD, rownames(readMTL$CALRAD) == "B10_dn")Run the calculation using the band 10 raster layer
Next convert the TOA to Brightness Temperature \(T_b\) using the following equation:
\[T_b=\frac{K_2}{ln((K_1/\lambda)+1)}\]
Brightness temperature is the radiance travelling upward from the top of the atmosphere to the satellite in units of the temperature of an equivalent black body.
K1 (774.8853) and K2 (1321.0789) are pre launch calibration constants provided by USGS.
Instead of hardcoding these values…yep, you guessed it… we can extract them from our MTL
Calidata <- as.data.frame(readMTL$CALBT)
# subet the rows
Calidata <- subset(Calidata, rownames(Calidata) %in% "B10_dn")
# subset the columns
K1<-subset(Calidata, select=(K1))
K2<-subset(Calidata, select=(K2))
# get just the values out
K1<-K1$K1
K2<-K2$K2
# this would also work for K1
K1<-readMTL$CALBT$K1[1]
#for K2
K2<-readMTL$CALBT$K2[1]
Brighttemp<-(K2 / log((K1 / TOA) + 1))Earlier we calcualted NDVI, let’s use that to determine emissivity of each pixel.
First we need to calculate the fractional vegetation of each pixel, through the equation:
\[F_v= \left( \frac{NDVI - NDVI_{min}}{NDVI_{max}-NDVI_{min}} \right)^2\]
Fractional vegetation cover is the ratio of vertically projected area of vegetation to the total surface extent.
Here, \(NDVI_{min}\) is the minimum NDVI value (0.2) where pixels are considered bare earth and \(NDVI_{max}\) is the value at which pixels are considered healthy vegetation (0.5)
\[\varepsilon = 0.004*F_v+0.986\]
Emissivity is the ratio absorbed radiation engery to total incoming radiation engery compared to a blackbody (which would absorb everything), being ameasure of absoptivity.
Great, we’re nearly there… get our LST following the equation from Weng et al. 2004 (also summarised in Guja et al. (2018) and Avdan and Jovanovska (2016)):
\[LST= \frac{T_b}{1+(\lambda \varrho T_b / (p))ln\varepsilon}\]
Where:
\[p= h\frac{c}{\varrho}\]
Ok, don’t freak out….let’s start with calculating \(p\)
Here we have:
\(h\) which is Plank’s constant \(6.626 × 10^-34 Js\)
\(c\) which is the velocity of light in a vaccum \(2.998 × 10^8 m/sec\)
\(\varrho\) which is the Boltzmann constant of \(1.38 × 10^-23 J/K\)
Now for the rest of the equation….we have the values for:
\(\lambda\) which is the effective wavelength of our data (10.9 for Landsat 8 band 10)
\(\varepsilon\) emissivity
\(T_b\) Brightness Temperature
Run the equation with our data
#define remaining varaibles
lambda=1.09e-5
#run the LST calculation
LST <-Brighttemp/(1 +(lambda*Brighttemp/p)*log(emiss))
# check the values
LST## class : RasterLayer
## dimensions : 533, 592, 315536 (nrow, ncol, ncell)
## resolution : 30, 30 (x, y)
## extent : 540915, 558675, 5917125, 5933115 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=30 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
## source : memory
## names : layer
## values : 281.4855, 305.9571 (min, max)
Are the values very high?… That’s because we are in Kevlin not degrees Celcius…let’s fix that and plot the map
Nice that’s our temperature data sorted.
Calucating urban area from Landsat data
How about we extract some urban area using another index and then see how our temperature data is related?
We will use the Normalized Difference Built-up Index (NDBI) algorithm for identification of built up regions using the reflective bands: Red, Near-Infrared (NIR) and Mid-Infrared (MIR) originally proposed by Zha et al. (2003).
It is very similar to our earlier NDVI calculation but using different bands…
\[NDBI= \frac{SWIR1-NIR}{SWIR1+NIR}\]
In Landsat 8 data the SWIR is band 6 and the NIR band 5
But do you remember our function? …Well this is the same calculation we used there just with different raster layers (or bands) so we could reuse it…
We could plot the varaibles agaisnt each other but there are a lot of data points
This is termed the overplotting problem. So, let’s just take a random subset of the same pixels from both raster layers.
To do so we need to again stack our layers
# stack the layers
computeddata=stack(LST,NDBI)
# take a random subset
random=sampleRandom(computeddata, 500, cells=TRUE)
# check the output
plot(random[,2], random[,3])Transfrom the data to a data.frame to work with ggplot, then plot
#convert to a data frame
randomdf=as.data.frame(random)
#rename the coloumns
names(randomdf)<-c("cell", "Temperature", "Urban")
heat<-ggplot(randomdf, aes(x = Urban, y = Temperature))+
geom_point(alpha=2, colour = "#51A0D5")+
labs(x = "Temperature",
y = "Urban index",
title = "Manchester urban and temperature relationship")+
geom_smooth(method='lm', se=FALSE)+
theme_classic()+
theme(plot.title = element_text(hjust = 0.5))
# interactive plot
ggplotly(heat)How about plotting the whole dataset rather than a random subset…
computeddatadf<-as.data.frame(computeddata)
names(computeddatadf)<-c("Temperature", "Urban")
hexbins <- ggplot(computeddatadf, aes(x=Urban, y=Temperature) ) +
geom_hex(bins=100, na.rm=TRUE) +
labs(fill = "Count per bin") +
geom_smooth(method='lm', se=FALSE, size=0.6) +
theme_bw()
ggplotly(hexbins)To see if our varaibles are related let’s run some basic correlation
stat.sig=cor.test(computeddatadf$Temperature, computeddatadf$Urban, use = "complete.obs",
method = c("pearson"))
stat.sig##
## Pearson's product-moment correlation
##
## data: computeddatadf$Temperature and computeddatadf$Urban
## t = 393.67, df = 198268, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.6598805 0.6648217
## sample estimates:
## cor
## 0.6623583
Avdan, U. and Jovanovska, G., 2016. Algorithm for automated mapping of land surface temperature using LANDSAT 8 satellite data. Journal of Sensors, 2016.
Guha, S., Govil, H., Dey, A. and Gill, N., 2018. Analytical study of land surface temperature with NDVI and NDBI using Landsat 8 OLI and TIRS data in Florence and Naples city, Italy. European Journal of Remote Sensing, 51(1), pp.667-678.
Weng, Q., Lu, D. and Schubring, J., 2004. Estimation of land surface temperature–vegetation abundance relationship for urban heat island studies. Remote sensing of Environment, 89(4), pp.467-483.
Young, N.E., Anderson, R.S., Chignell, S.M., Vorster, A.G., Lawrence, R. and Evangelista, P.H., 2017. A survival guide to Landsat preprocessing. Ecology, 98(4), pp.920-932.
Zha, Y., Gao, J. and Ni, S., 2003. Use of normalized difference built-up index in automatically mapping urban areas from TM imagery. International journal of remote sensing, 24(3), pp.583-594.